ABSTRACT

This paper investigates the relationship between a
multifactor grading system and standardized achievement test scores.
The study attempts to measure not only achievement but also
motivation and rate of progress. Two hypotheses are tested: (1)
Teacher measures of application, improvement, and grade level do not
bear a significant relationship to standardized test scores in
mathematics. (2) Teacher measures of application and improvement do
not add significantly in prediction to that provided by the grade
level and class. The procedure of the study is described in light of
the definition of terms, sample, instrument, design, and data
analysis. It is clear from the data that the grades at this private
day school bear a strong relationship to achievement test scores and
are a good predictor of them. However, neither application nor
improvement, as measured, added significantly to the straight
achievement measure. The reasons for, and implications of, this
result are explained. (SB)






PREDICTIVE ABILITY OF A MULTI-FACTOR
GRADING SYSTEM



John E. Bailey III 







Prepared For: 



Dr. James Montgomery 
North Broward Day School 







1 July 1974 
NOVA UNIVERSITY 
Fort Lauderdale, Florida



The purpose of this study is to investigate the
relationship between a multi-factor grading system and
standardized achievement test scores. In recent years
numerous segments of the population have attacked the traditional
A-F grading system employed by the schools (Zimmerman, 1970;
Miller, 1967; Glasser, 1969; and Milton, 1972). The
criticisms of the traditional system are numerous, but one of the
central criticisms has been that grades attempt to measure
with one score several important and partially independent
dimensions of behavior (Driscoll, 1972). There are a number
of possible responses to the problems that the critics of the
traditional system have raised. The most typical response
has been to reduce the distinctions that are attempted,
as in the various pass-fail systems. Other alternatives
of this type have ranged from a computerized prose
evaluation system (Giannangelo, 1974) to a system of only
recording completes as students master behavioral objectives
(Zimmerman, 1970). A conceptual alternative to this strategy
of making fewer distinctions is to attempt to measure the
other important dimensions of the student in school. This
alternative has been attempted by a private day school. It
attempts to measure not only achievement, but motivation and
rate of progress as well.
RELATED LITERATURE

The study of the validity of school grades has had mixed
results. A number of studies have shown that the high school
grade point average is an excellent predictor of college
grades (Wilson, 1970; Farver, 1973; McCausland, 1974; and
Wilson, 1971). These studies have shown this to be true for
a variety of populations. By contrast, Hoyt (1966), in a
review of the literature dating back to 1917, found that
college grades bear little or no relationship to adult
success as measured by a number of variables. Jackson and
Lahaderne (1967) found only a small relationship between
teacher awarded grades and standardized test scores in the
sixth grade. Of course, all the studies outlined above
were based on grades as defined by the traditional A-F
system and thus do not attempt to measure separately
motivation or rate of improvement.
HYPOTHESES

This study was designed to investigate two hypotheses.
The null hypotheses are:

I. Teacher measures of application, improvement, and
grade level do not bear a significant relation to standardized
test scores in mathematics.

If Hypothesis I is rejected, it is legitimate to test
the hypothesis that

II. Teacher measures of application and improvement
do not add significantly in prediction to that provided
by the grade level and class.

Other questions of interest include:

1. Of the available teacher measures, which set provides
the best prediction combined with the fewest predictors?
2. Is there any difference in prediction between
the prediction of the total math scores and subscores for
reasoning and computation?
PROCEDURE 
Definition of Terms. Appendix A contains an example
of the grade card that is used in this system. It is
comprised of four components: application, improvement,
grade level, and conduct. Application is graded on a five
point scale along a dimension from seeks independent work
to will not work. Improvement is a four point scale from
accelerated to none. Grade level is a three point scale
from above grade level to below grade level. Conduct is
a satisfactory-unsatisfactory dichotomy.
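
As an illustration only, the four components of the grade card
can be coded as ordinal predictors for a regression. The sketch
below is in Python. The class name GradeCard, the numeric codes,
and the convention that a higher number is the more favorable
rating are all assumptions; they are not taken from the card in
Appendix A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradeCard:
    """Hypothetical numeric coding of one student's grade card."""

    application: int  # assumed: 1 (will not work) .. 5 (seeks independent work)
    improvement: int  # assumed: 1 (none) .. 4 (accelerated)
    grade_level: int  # assumed: 1 (below) .. 3 (above grade level)
    conduct: bool     # True = satisfactory, False = unsatisfactory

    def __post_init__(self):
        # Reject codes outside the scale lengths given in the text
        # (five, four, and three points respectively).
        limits = {"application": 5, "improvement": 4, "grade_level": 3}
        for name, top in limits.items():
            value = getattr(self, name)
            if not 1 <= value <= top:
                raise ValueError(f"{name} must be between 1 and {top}, got {value}")

    def as_predictors(self):
        """Return the card as a list of numbers usable as regression predictors."""
        return [self.application, self.improvement, self.grade_level, int(self.conduct)]


card = GradeCard(application=5, improvement=4, grade_level=3, conduct=True)
predictors = card.as_predictors()
```

Treating the scales as equally spaced integers mirrors the
linearity assumption discussed under Data Analysis; a different
coding would be needed if that assumption were dropped.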

Sample. The sample consisted of 31 white, middle
socio-economic status children from a small, private
day school in northern Broward County, Florida. Students
were from 4th, 5th and 6th grades. Classes were small,
with 9, 8, and 14 students in each class respectively. The
school emphasizes a modern, concept oriented approach to
the study of mathematics. Sexes were approximately balanced.

Instrument. The standardized test used was the
California Achievement Test, 1957 edition. It is widely used
and has been favorably reviewed as to reliability and
validity (Neidt, 1957). It consists of 11 scores, including
mathematics reasoning, fundamentals and total, the scales
used in this study.

Design. The school operates on a nine month school
year divided into four 9-week quarters. The grades used
were from the third quarter, which ended in the last week
of March 1973. Grades were awarded normally by the teachers.
Since this study was not even contemplated at that time,
the subjects could not have been influenced by its existence.
The achievement test was administered in the second week
of April 1973 under the supervision of a Ph.D. psychologist
and scored by the distributor. Grades from the quarter
preceding the administration of the achievement test were
used to predict its results.

Data Analysis. The data were analyzed via the ROL-Regression
Analysis program, using both generation and
transformation of variables. This is the 1 September 1969
version compiled by George Peabody Computer Center.
Since it was not obvious that the various teacher
measures can be assumed to be linear, all could be tested
for linearity using the appropriate full and restricted
models. Due to the limited range of the variables, only
grade level and class were so tested. Based on the results
of that test, all other variables were assumed to be linear.
Linear interactions were tested for grade level, improvement,
and application with class. Various alternative models
were tested to determine which model was most parsimonious
without being significantly poorer in prediction. Due
to the small sample size, no attempt was made to determine
if moderator groups existed.
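
The comparison of a full model with a restricted model is
conventionally made with an F ratio on the change in R². A
minimal Python sketch follows. The function names and the example
numbers are hypothetical and are not taken from the tables of this
study. The formula is the standard one for nested least-squares
models with an intercept; that the regression program used here
computes it the same way is an assumption.

```python
def f_ratio(r2_full, r2_restricted, df_num, df_den):
    """F ratio for the gain in R-squared of a full over a restricted model."""
    if not 0.0 <= r2_restricted <= r2_full < 1.0:
        raise ValueError("expected 0 <= r2_restricted <= r2_full < 1")
    if df_num <= 0 or df_den <= 0:
        raise ValueError("degrees of freedom must be positive")
    gain_per_predictor = (r2_full - r2_restricted) / df_num
    unexplained_per_df = (1.0 - r2_full) / df_den
    return gain_per_predictor / unexplained_per_df


def f_ratio_from_counts(r2_full, r2_restricted, n, k_full, k_restricted):
    """Same test, deriving the degrees of freedom from sample and model sizes.

    n is the number of subjects; k_full and k_restricted are the numbers of
    predictors (excluding the intercept) in each model.
    """
    df_num = k_full - k_restricted   # predictors dropped by the restriction
    df_den = n - k_full - 1          # residual df of the full model
    return f_ratio(r2_full, r2_restricted, df_num, df_den)


# Hypothetical example: 31 subjects, a 6-predictor full model with
# R-squared .50 against a 4-predictor restricted model with R-squared .40.
example_f = f_ratio_from_counts(0.50, 0.40, n=31, k_full=6, k_restricted=4)
```

In this made-up case the degrees of freedom are 2 and 24 and the
F ratio is 2.4. The sketch stops at the ratio; turning it into a
probability requires the F distribution, which is left to a
statistics library.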

RESULTS

For Hypothesis I, as can be seen from Table 1, all
tested models were significantly better predictors than
chance, accounting for from 62% to 42% of the variance. Thus
Hypothesis I is rejected and it is appropriate to test
Hypothesis II.

INSERT TABLE 1 ABOUT HERE

In terms of Hypothesis II, neither application nor
improvement scores nor their interaction with class add
significant prediction to that provided by class and
grade level. These have F-ratios of .102 and 2.415,
with resulting probabilities of .75 and .13 respectively.
These results are contained in Table 2.

INSERT TABLE 2 ABOUT HERE
Table 3 contains the relative contribution by each
of the predictors to the prediction equation. The contributions
by each predictor coincide with their use in the obtained
prediction equations except for the non-inclusion of the
interaction term for improvement and class. This non-inclusion
can be explained by the very high correlation between this
term and the improvement score (r = .94). Its contribution
is entirely included within the improvement score and thus
it adds nothing to prediction. The prediction from the
various models, plus the number of predictors used, is
contained in Table 1.
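
The overlap between an interaction term and one of its
components can be illustrated with made-up numbers. In the Python
sketch below the interaction is taken to be the product of the
improvement score and the class (grade) number, which is an
assumption about how the term was generated. The scores are
invented for the illustration and are not the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented data: every improvement rating (1..4) occurs once in each
# of three classes (grades 4, 5, and 6).
improvement = [1, 2, 3, 4] * 3
school_class = [4] * 4 + [5] * 4 + [6] * 4

# Assumed form of the interaction term: improvement times class.
interaction = [i * c for i, c in zip(improvement, school_class)]

r_improvement_interaction = pearson_r(improvement, interaction)
```

With these invented scores the correlation is above .9, so once
the improvement score is in the equation the product term has
little independent variance left to contribute.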

INSERT TABLE 3 ABOUT HERE

In examining the question of whether there was any
difference between the prediction of total score and
concept and computation scores, important differences were
found. Table 4 illustrates that all but two of the correlations
between the predictors and the concept score are lower than
those for total score and computation scores. Prediction
was also lower, ranging from 37% to 23%.

INSERT TABLE 4 ABOUT HERE

DISCUSSION

It is clear from the data presented here that the
grades at this school bear a strong relationship to
achievement test scores and are good predictors of them. Within
the system employed, the teachers' perceptions of academic
achievement in terms of total math scores are probably
close to exhausting the predictive ability of the system.

However, in this sample, neither application nor
improvement, as measured, added significantly to the straight
achievement measure. This is not to say that they are
not useful or valid measures. Neither is designed to
measure math achievement; they are designed to measure
the student's independence and improvement. To the extent
that these are reflected in higher achievement and that
higher achievement is accurately measured by the grade level
score, neither would be expected to add to prediction.
Whether they are useful measures must thus stand on their
reliability and validity in terms of their original intention.
The extent to which they are successful in this regard
cannot be determined by this study.

However, for improvement there is some information
that tempts one to believe that it is not so much a measure
of improvement as of achievement. It is reported to parents
in the form of the letters A-D. This alone would be a
powerful suggestion to revert to the traditional system in so
far as this particular measure is concerned. The fact that
it is the single best predictor of academic achievement
reinforces this idea.

Attempting to resolve the question of which predictor
model is best presents a number of difficult tradeoffs. It
is probably unnecessary to resolve this issue, other than
saying that some combination of these variables, treated
as being continuous, will account for between 40% and 50% of
the variance. For the purpose of reporting their children's
achievement to parents, any of these relationships is
strong enough. Equally, if one wishes to do research and
does not have the relevant standardized tests, these data
would suggest that grades are strongly enough related to
be a practical alternative.

In comparing the traditional one grade model to
this model, which adds improvement and application scores,
the outcome is complicated. Statistically nonsignificant
differences were found. However, the difference in variance
accounted for is fairly large (6% of the variance). Again,
since fine discriminations are not typically made on the
basis of these scores, either model is acceptable. If any
important decisions are being made, then it would probably
be wise to use the fullest possible model. To the extent
these additional scores add information about areas other
than academic achievement, they may well be useful and
important and justifiable on that basis.

In terms of the school's stated emphasis on the learning
of mathematical concepts, a discrepancy was found. The
measurement criteria are weighted far more strongly toward
computational skills than to reasoning or conceptual skills. In terms
of achievement test scores, the students are in fact
doing better in math reasoning than in computation, so
the teaching emphasis may be there. But this success is
not being reflected as accurately in the various grades
as is computational ability.

Turning to the wider questions on which this study
has bearing, there are two. One is the relation between
teacher measures and student achievement. Clearly this
study supports the idea that teacher measures are very
clearly related to student scores on standardized tests in this
school. Of course, standardized tests are not the same
thing as student achievement. Students inevitably get a
lot more, good and bad, out of the classroom than the
specific subjects that they are taught. But there is strong
evidence that standardized tests do in fact measure fairly
well the degree to which the student has learned the skills
and subjects that are explicitly being taught. Thus this
study supports the idea that teacher measures do bear a
strong relationship to student achievement in the formal
subjects in the curriculum.

This brings up the second question. How does one
explain the low or non-existent relationships between
grades and measures of non-academic adult achievement? One
possibility is to say that grades don't really measure
what has been taught. This study, for the reasons mentioned
above, does not support this contention. Grades are reasonably
good measures of what is in the curriculum. This leaves
open the question of what then the schools do contribute.
This study can add nothing to answering that question
except to suggest that indicting the grading system is
not the answer. Rather, it suggests that the answer must
be found by examining what is taught and in what way it
should and does contribute to a successful life after
school, however that may be defined.

SUGGESTIONS FOR FURTHER RESEARCH

The most direct continuation of this research would
be the investigation of the relationships between grades
and standardized test scores in other subjects. For each
of these subjects the predictive ability of each model can
be tested. If possible the study should use a larger sample,
which would allow for the testing of moderator variables.
Beyond the immediate question of the degree to which
teacher measures are related to academic achievement, the
far more important question is the degree to which academic
achievement, however measured, is related to later life.
Given the enormous effort in both time and money that is
devoted to the schools and to research and development
activities associated with them, knowing what the schools
and the various subjects within the curriculum contribute
to later life seems of critical importance. Multivariate
prediction studies using either grades or standardized test
scores to predict non-academic success could isolate the
effect of each component of the curriculum.

TABLE 1
Variance Accounted For

Model

  Grade and Class Non-Linear
  Full Linear
  Improvement-Interaction
  Improvement w/o Interaction
  Application-Interaction
  Application w/o Interaction
  Class and Grade-Linear Interaction
  Class and Grade-Linear w/o Interaction

Variance accounted for (legible entries, in order of appearance;
? marks an illegible digit)

  .6157*   .5127**   .?695**   .467?***

Number of Predictors (legible entries, in order of appearance)

  12   8   6   6   5

*p < .05   **p < .01   ***p < .001

TABLE 2
Significance Test

Full Model                       F Ratio    Probability

Improvement-Interaction           1.202        .3175
Improvement w/o Interaction       2.415        .1288
Application-Interaction            .597        .6818
Application w/o Interaction        .102        .7502

Difference in R² (legible entries only): .0164, .0021

Note: All models tested against Class and Grade-Linear
Interaction.


TABLE 3
Contribution Coefficient

Predictor                        Contribution

Application                          .42
Improvement                          .78
Grade Level                          .58
Conduct                              .12
Class                                .70
Application-Class Interaction        .05
Improvement-Class Interaction     (illegible)
Grade Level-Class Interaction     (illegible)

TABLE 4
Test - Predictor Correlations

Predictor                        Concept    Computation    Total

Application                       -.15      (illegible)     .22
Improvement                        .39         .43          .42
Grade Level                    (illegible)     .35          .?2
Conduct                            .01         .16          .06
Class                              .24         .56          .38
Application-Class Interaction  (illegible)     .0?          .03
Improvement-Class Interaction      .27      (illegible)     .26
Grade Level-Class Interaction      .17         .21          .17